EU Employment Analysis (EUROSTAT Data)

Davide Bittelli

December 8, 2024


1 Introduction

This report presents a Principal Component Analysis (PCA) of employment data from Eurostat. The goal is to:

  1. Identify patterns and relationships between countries based on employment statistics for different citizen categories.
  2. Reduce dimensionality for easier visualization and interpretation.
  3. Determine the contributions of each variable (citizen category) to the principal components.
  4. Visualize countries’ similarities and differences in employment patterns.
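
As a rough illustration of the four goals above, the sketch below runs a PCA on a small made-up country-by-category matrix. It uses base R's prcomp rather than the FactoMineR and factoextra packages loaded in this report, and the country labels (A to E), column names, and numbers are invented for the example, not Eurostat values.

```r
# Toy data: rows are hypothetical countries, columns are citizenship categories.
# All values are invented for illustration only.
toy <- matrix(
  c(900,  40,  60,
    850,  80,  70,
    400,  10,  15,
    420,  12,  30,
    700, 120, 180),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D", "E"),
                  c("NAT", "EU_FOR", "NEU_FOR"))
)

# Centre and scale so that no category dominates because of its magnitude
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Share of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Loadings: weight of each category in each component
loadings <- pca$rotation

# Scores: country coordinates on the first two components
scores <- pca$x[, 1:2]

print(round(var_explained, 3))
print(round(loadings, 3))
print(round(scores, 3))
```

The scores can be plotted to compare countries, while the loadings indicate which categories drive each component.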

2 Loading necessary Libraries

The following libraries are used for data manipulation, analysis, and visualization.

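# Load necessary libraries
library(tidyverse)    # data manipulation and plotting (ggplot2, dplyr, forcats, ...)
library(lubridate)    # date handling
library(eurostat)     # download Eurostat datasets
library(FactoMineR)   # PCA
library(factoextra)   # PCA visualization
library(plotly)       # interactive charts
library(scales)       # axis and label formatting
library(patchwork)    # combine ggplot2 plots
library(gridExtra)    # arrange plots in grids
library(highcharter)  # interactive charts
library(knitr)        # tables (kable)
library(kableExtra)   # table styling
library(htmltools)    # HTML helpers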

3 Data Import and Exploration

The lfsa_egan dataset was obtained from EUROSTAT’s official website. This dataset belongs to the Labour Force Survey (LFS) and contains annual employment data categorized by different demographics and economic dimensions.
- Dataset: 496671 observations and 8 variables.
- Types of variables: it mainly contains categorical variables and one numerical variable.

data <- get_eurostat("lfsa_egan")

In order to have a better understanding of the dataset, a brief description for each variable and the corresponding unique values are provided.

# Sample data frame
df <- data.frame(
  Variable = c(
    "TIME_PERIOD", "geo", "citizen", "age", "unit",
    "sex", "freq", "values"
  ),
  Description = c(
    "Sampling year",
    "Geopolitical entity",
    "Country of citizenship",
    "Age ranges",
    "Unit of measurement for employment",
    "Gender",
    "Sampling frequency",
    "Values"
  ),
  Type = c("date", "chr", "chr", "chr", "chr", "chr", "chr", "num"),
  Unique_values = c(
    paste0("`", unique(data$TIME_PERIOD), "`", collapse = " "),
    paste0("`", unique(data$geo), "`", collapse = " "),
    paste0("`", unique(data$citizen), "`", collapse = " "),
    paste0("`", unique(data$age), "`", collapse = " "),
    "`THS_PER` (Thousand persons)",
    "`F` (Female) `M` (Male) `T` (Total)",
    "`A` (Annual)",
    "numerical values"
  )
)

# Create and style the table
kable(df, "html", escape = FALSE, col.names = c("Variable", "Description", "Type", "Unique_values")) %>%
  kable_styling(
    full_width = FALSE,
    position = "center",
    bootstrap_options = c("striped", "hover"),
    font_size = 14
  ) %>%
  row_spec(0, bold = TRUE) %>%  # Make the header row bold
  column_spec(1, bold = TRUE) %>%   # Make the first column (Variable) bold
  scroll_box(width = "100%", height = "100%", fixed_thead   = FALSE)
Variable | Description | Type | Unique_values
TIME_PERIOD | Sampling year | date | 1995-01-01 1996-01-01 1997-01-01 1998-01-01 1999-01-01 2000-01-01 2001-01-01 2002-01-01 2003-01-01 2004-01-01 2005-01-01 2006-01-01 2007-01-01 2008-01-01 2009-01-01 2010-01-01 2011-01-01 2012-01-01 2013-01-01 2014-01-01 2015-01-01 2016-01-01 2017-01-01 2018-01-01 2019-01-01 2020-01-01 2021-01-01 2022-01-01 2023-01-01
geo | Geopolitical entity | chr | AT BE CH CY CZ DE DK EA20 EE EL ES EU27_2020 FI FR HU IE IS IT LU ME MT NL NO PT RS SE SI SK UK BG HR LT LV MK PL RO BA TR
citizen | Country of citizenship | chr | EU27_2020_FOR FOR NAT NEU27_2020_FOR NRP STLS TOTAL
age | Age ranges | chr | Y15-19 Y15-24 Y15-39 Y15-59 Y15-64 Y15-74 Y20-24 Y20-64 Y25-29 Y25-49 Y25-54 Y25-59 Y25-64 Y25-74 Y30-34 Y35-39 Y40-44 Y40-59 Y40-64 Y45-49 Y50-54 Y50-59 Y50-64 Y50-74 Y55-59 Y55-64 Y60-64 Y65-69 Y65-74 Y70-74 Y_GE15 Y_GE25 Y_GE50 Y_GE65 Y_GE75
unit | Unit of measurement for employment | chr | THS_PER (Thousand persons)
sex | Gender | chr | F (Female) M (Male) T (Total)
freq | Sampling frequency | chr | A (Annual)
values | Values | num | numerical values

6 Employment Comparison by Country and Citizenship Category

A comparative analysis was conducted to explore how employment levels vary across countries for specific citizenship categories. The aggregates EU27_2020 and EA20 were excluded because their totals would dwarf those of individual countries and make the countries harder to compare with each other.
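
The exclusion of aggregates described above can be sketched in two ways. The toy table and its values are invented, and the second option rests on an assumption worth checking against the actual geo codes: that individual countries use two-character codes while aggregates use longer ones.

```r
library(dplyr)

# Toy stand-in for the Eurostat table (values are invented)
toy <- tibble(
  geo    = c("DE", "FR", "EU27_2020", "EA20", "IT"),
  values = c(10, 8, 100, 80, 6)
)

# Option 1: explicit exclusion list, as used in this report
countries_explicit <- toy %>%
  filter(!geo %in% c("EU27_2020", "EA20"))

# Option 2 (assumption): keep only two-character codes, on the premise that
# individual countries have two-letter codes and aggregates have longer ones
countries_by_length <- toy %>%
  filter(nchar(geo) == 2)

print(countries_by_length$geo)
```

The explicit list is safer when the set of aggregates is known; the length-based filter may be convenient if new aggregates appear in later downloads.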

For this analysis, a subset of the initial dataset was created, restricted to the working-age population (Y15-64) with no breakdown by sex (T). Four years were selected:
- 2000: represents the early 2000s, before the major EU enlargements.
- 2007: captures the situation after the major EU enlargements.
- 2010: captures employment trends after the 2008 financial crisis.
- 2023: reflects the latest available data, including the post-COVID-19 recovery.

Heatmaps were generated for each of these years and two comparisons were made.

6.1 Heatmaps Comparison - Before and After the EU Enlargements (2000 and 2007)

Heatmaps for 2000 and 2007 were generated and compared to capture the differences in employment levels across countries before and after the EU enlargements.

  # Filter data for selected years and only 'Total' for sex and working-age population (15-64)
  selected_years <- c("2000-01-01", "2007-01-01")
  filtered_data <- data %>%
    filter(
      TIME_PERIOD %in% as.Date(selected_years),
      sex == "T",
      age == "Y15-64",
      !geo %in% c("EU27_2020", "EA20")  # Exclude "EU27_2020" and "EA20"
    )
  
  # Group employment by country, citizenship, and year
  country_citizen_summary <- filtered_data %>%
    group_by(geo, citizen, TIME_PERIOD) %>%
    summarise(total_employment = sum(values, na.rm = TRUE), .groups = "drop")  # drop grouping to avoid the regrouping message
  
  # Function to create a plotly heatmap for a specific year
  create_plotly_heatmap <- function(year, showlegend = TRUE) {
    df <- country_citizen_summary %>% filter(TIME_PERIOD == as.Date(year))
    
    # Multiply total_employment by 1,000 to reflect real values
    df <- df %>% mutate(total_employment_real = total_employment * 1000)
    
    plot_ly(
      data = df,
      x = ~geo,
      y = ~citizen,
      z = ~total_employment_real,
      type = "heatmap",
      colors = colorRamp(c("#f0f9ff", "#084594")),
      colorbar = list(
        title = "<b>Employment</b>",
        tickfont = list(size = 12),
        titlefont = list(size = 14, family = "Arial")
      ),
      showscale = showlegend,   # Control legend display
      zmin = 0, zmax = 50000000,   # Set consistent color scale limits
      
      # Custom hover text
      hovertemplate = paste(
        "<b>Country:</b> %{x}<br>",
        "<b>Citizenship:</b> %{y}<br>",
        "<b>Total Employment:</b> %{z}<br>",
        "<extra></extra>"  # Removes default trace info
      )
    ) %>%
      layout(
        title = list(
          #text = paste("<b>Employment by Country and Citizenship -", format(as.Date(year), "%Y"), "</b>"),
          font = list(size = 18, family = "Arial"),
          x = 0.5,  # Center the title
          xanchor = "center"
        ),
        xaxis = list(
          title = "<b>Country</b>",
          tickangle = 45,
          tickfont = list(size = 10),
          titlefont = list(size = 14, family = "Arial")
        ),
        yaxis = list(
          title = "<b>Citizenship Category</b>",
          tickfont = list(size = 10),
          titlefont = list(size = 14, family = "Arial")
        ),
        margin = list(t = 60, b = 60)  # Add padding to top and bottom margins
      )
  }
  
  # Create plotly heatmaps for each selected year
  interactive_heatmap_2000 <- create_plotly_heatmap("2000-01-01", showlegend = TRUE)
  interactive_heatmap_2007 <- create_plotly_heatmap("2007-01-01", showlegend = FALSE)

  # Combine the interactive heatmaps vertically
  combined_interactive_heatmaps <- subplot(
    interactive_heatmap_2000,
    interactive_heatmap_2007,
    nrows = 2,
    shareX = TRUE,
    shareY = TRUE,
    titleX = TRUE,
    titleY = TRUE
  ) %>%
    layout(
      annotations = list(
        list(
          text = "<b>Year: 2000 (before EU enlargements)</b>",
          x = 0.5,
          y = 1.06,
          xref = "paper",
          yref = "paper",
          showarrow = FALSE,
          font = list(size = 11, family = "Arial")
        ),
        list(
          text = "<b>Year: 2007 (after EU enlargements)</b>",
          x = 0.5,
          y = 0.50,
          xref = "paper",
          yref = "paper",
          showarrow = FALSE,
          font = list(size = 11, family = "Arial")
        )
      ),
      title = "<b>Employment Trends by Citizenship Category and Country (2000 vs 2007)</b>",
      margin = list(l = 100, r = 50, t = 80, b = 100),
      titlefont = list(size = 20, family = "Arial", color = "black")
    )
  
  # Display the combined interactive heatmaps
  combined_interactive_heatmaps

The heatmaps for 2000 and 2007 show notable variations in employment levels across European countries and citizenship categories. The color intensity represents employment levels, with darker blue indicating higher employment.

6.1.1 Year: 2000 (Before EU Enlargements)

  • Employment for nationals (NAT) dominates in most countries, reflecting a labor market primarily consisting of native citizens.
  • Lower employment levels are observed for the foreign-citizen categories (EU27_2020_FOR and NEU27_2020_FOR), as the European Union had not yet expanded to include countries from Eastern Europe (the 2004 and 2007 enlargements).
  • Employment is more concentrated in Western European countries such as Germany (DE), France (FR), and the United Kingdom (UK).

6.1.2 Year: 2007 (After EU Enlargements)

  • Noticeable increase in employment for the foreign-citizen categories (EU27_2020_FOR and NEU27_2020_FOR), reflecting the impact of the 2004 EU enlargement (when 10 new countries joined the EU) and the 2007 accession of Bulgaria and Romania.
  • Higher employment in countries like Germany (DE), the United Kingdom (UK), and Spain (ES) suggests increased labor migration and workforce integration of foreign citizens.
  • Some Eastern European countries begin to show increased employment for nationals and foreign-citizen categories, indicating the effect of free movement of workers within the EU.

6.1.3 Key Insights

  • Impact of EU Enlargement:
    The period between 2000 and 2007 shows the effects of EU enlargement, with a rise in employment for foreign-citizen categories due to increased migration and labor mobility within the EU.

  • Country-Specific Patterns:

    • Germany (DE) and France (FR) maintain high employment levels across both years, reflecting their strong economies and capacity to integrate foreign workers.
    • Spain (ES) shows increased employment for foreign-citizen categories, consistent with the construction boom and economic growth before the 2008 financial crisis.
  • Labor Market Integration:
    The increase in employment for foreign-citizen categories highlights the integration of migrants into the workforce and the importance of policies supporting labor mobility and inclusion.
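
One way to make the insights above more concrete is to compute the share of foreign citizens in employment for each country and year, then the change in percentage points. This is a minimal sketch on invented data: the countries "AA" and "BB", the values, and the choice of NAT + FOR as the denominator are assumptions for the example (in the real dataset the TOTAL category may also include non-respondents and stateless persons).

```r
library(dplyr)
library(tidyr)

# Toy data in the same long format as country_citizen_summary
# (hypothetical countries and invented values, in thousand persons)
toy <- tribble(
  ~geo, ~citizen, ~year, ~total_employment,
  "AA", "NAT",    2000,  900,
  "AA", "FOR",    2000,  100,
  "AA", "NAT",    2007,  880,
  "AA", "FOR",    2007,  220,
  "BB", "NAT",    2000,  480,
  "BB", "FOR",    2000,   20,
  "BB", "NAT",    2007,  475,
  "BB", "FOR",    2007,   25
)

foreign_share <- toy %>%
  # One column per citizenship category
  pivot_wider(names_from = citizen, values_from = total_employment) %>%
  # Share of foreign citizens (assumed denominator: NAT + FOR)
  mutate(share_foreign = FOR / (NAT + FOR)) %>%
  select(geo, year, share_foreign) %>%
  # One column per year, then the change in percentage points
  pivot_wider(names_from = year, values_from = share_foreign,
              names_prefix = "y") %>%
  mutate(change_pp = 100 * (y2007 - y2000))

print(foreign_share)
```

Shares are less sensitive to country size than absolute levels, so they may complement the heatmaps, whose fixed color scale is dominated by the largest countries.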

6.2 Heatmaps Comparison - After Two Major Crises (2010 and 2023)

Heatmaps for 2010 and 2023 were compared to capture the differences in employment levels across countries after two major crises: the 2008 financial crisis and the COVID-19 pandemic.

  # Filter data for selected years and only 'Total' for sex and working-age population (15-64)
  selected_years <- c("2010-01-01", "2023-01-01")
  filtered_data <- data %>%
    filter(
      TIME_PERIOD %in% as.Date(selected_years),
      sex == "T",
      age == "Y15-64",
      !geo %in% c("EU27_2020", "EA20")  # Exclude "EU27_2020" and "EA20"
    )
  
  # Group employment by country, citizenship, and year
  country_citizen_summary <- filtered_data %>%
    group_by(geo, citizen, TIME_PERIOD) %>%
    summarise(total_employment = sum(values, na.rm = TRUE), .groups = "drop")  # drop grouping to avoid the regrouping message
  
  # Function to create a plotly heatmap for a specific year
  create_plotly_heatmap <- function(year, showlegend = TRUE) {
    df <- country_citizen_summary %>% filter(TIME_PERIOD == as.Date(year))
    
    # Multiply total_employment by 1,000 to reflect real values
    df <- df %>% mutate(total_employment_real = total_employment * 1000)
    
    plot_ly(
      data = df,
      x = ~geo,
      y = ~citizen,
      z = ~total_employment_real,
      type = "heatmap",
      colors = colorRamp(c("#f0f9ff", "#084594")),
      colorbar = list(
        title = "<b>Employment</b>",
        tickfont = list(size = 12),
        titlefont = list(size = 14, family = "Arial")
      ),
      showscale = showlegend,   # Control legend display
      zmin = 0, zmax = 50000000,   # Set consistent color scale limits
      
      # Custom hover text
      hovertemplate = paste(
        "<b>Country:</b> %{x}<br>",
        "<b>Citizenship:</b> %{y}<br>",
        "<b>Total Employment:</b> %{z}<br>",
        "<extra></extra>"  # Removes default trace info
      )
    ) %>%
      layout(
        title = list(
          #text = paste("<b>Employment by Country and Citizenship -", format(as.Date(year), "%Y"), "</b>"),
          font = list(size = 18, family = "Arial"),
          x = 0.5,  # Center the title
          xanchor = "center"
        ),
        xaxis = list(
          title = "<b>Country</b>",
          tickangle = 45,
          tickfont = list(size = 10),
          titlefont = list(size = 14, family = "Arial")
        ),
        yaxis = list(
          title = "<b>Citizenship Category</b>",
          tickfont = list(size = 10),
          titlefont = list(size = 14, family = "Arial")
        ),
        margin = list(t = 60, b = 60)  # Add padding to top and bottom margins
      )
  }
  
  # Create plotly heatmaps for each selected year
  interactive_heatmap_2010 <- create_plotly_heatmap("2010-01-01", showlegend = TRUE)
  interactive_heatmap_2023 <- create_plotly_heatmap("2023-01-01", showlegend = FALSE)

  # Combine the interactive heatmaps vertically
  combined_interactive_heatmaps <- subplot(
    interactive_heatmap_2010,
    interactive_heatmap_2023,
    nrows = 2,
    shareX = TRUE,
    shareY = TRUE,
    titleX = TRUE,
    titleY = TRUE
  ) %>%
    layout(
      annotations = list(
        list(
          text = "<b>Year: 2010 (after 2008 Financial Crisis)</b>",
          x = 0.5,
          y = 1.06,
          xref = "paper",
          yref = "paper",
          showarrow = FALSE,
          font = list(size = 11, family = "Arial")
        ),
        list(
          text = "<b>Year: 2023 (after COVID-19 Pandemic)</b>",
          x = 0.5,
          y = 0.50,
          xref = "paper",
          yref = "paper",
          showarrow = FALSE,
          font = list(size = 11, family = "Arial")
        )
      ),
      title = "<b>Employment Trends by Citizenship Category and Country (2010 vs 2023)</b>",
      margin = list(l = 100, r = 50, t = 80, b = 100),
      titlefont = list(size = 20, family = "Arial", color = "black")
    )
  
  # Display the combined interactive heatmaps
  combined_interactive_heatmaps

The heatmaps for 2010 and 2023 show variations in employment levels across European countries and citizenship categories. The color intensity represents employment levels, with darker blue indicating higher employment.

6.2.1 Year: 2010 (After 2008 Financial Crisis)

  • Employment for nationals (NAT) remains prominent, but declines are observed in some countries due to the impact of the 2008 financial crisis.
  • Lower employment levels for the foreign-citizen categories (EU27_2020_FOR and NEU27_2020_FOR), reflecting the economic downturn’s disproportionate impact on migrant workers.
  • Employment remains concentrated in countries like Germany (DE) and France (FR), which were more resilient to the crisis.
  • Southern European countries such as Spain (ES) and Italy (IT) show noticeable declines, consistent with the severe economic challenges they faced during this period.

6.2.2 Year: 2023 (After COVID-19 Pandemic)

  • Recovery in employment levels for both nationals (NAT) and the foreign-citizen categories (EU27_2020_FOR and NEU27_2020_FOR), indicating the labor market rebound after the COVID-19 pandemic.
  • Significant employment growth for foreign-citizen categories in countries like Germany (DE), France (FR), and Spain (ES), suggesting a return to pre-pandemic trends and increased workforce integration.
  • Eastern European countries show varying employment levels, reflecting ongoing economic adjustments and workforce mobility within the EU.
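  • Missing values appear for some countries (e.g. United Kingdom UK, Turkey TR, North Macedonia MK). In this dataset UK is the code for the United Kingdom, not Ukraine, so its gap most likely reflects the end of UK data reporting to Eurostat after Brexit rather than the conflict between Russia and Ukraine. The reasons for the other gaps cannot be determined from the data alone.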
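
To check which country and citizenship cells are actually missing in a given year, one option is to expand the data to all combinations and keep the empty ones. The sketch below uses invented data (hypothetical countries "AA", "BB", "CC") and tidyr's complete(); it treats absent rows and explicit NA values alike.

```r
library(dplyr)
library(tidyr)

# Toy data (invented values); "CC" has no row for 2023, "BB" has an explicit NA
toy <- tribble(
  ~geo, ~citizen, ~year, ~values,
  "AA", "NAT",    2010,  900,
  "AA", "NAT",    2023,  950,
  "BB", "NAT",    2010,  480,
  "BB", "NAT",    2023,  NA,
  "CC", "NAT",    2010,  300
)

# complete() adds the geo/citizen/year combinations that are absent,
# filling values with NA; the filter then keeps every empty cell
missing_cells <- toy %>%
  complete(geo, citizen, year) %>%
  filter(is.na(values)) %>%
  select(geo, citizen, year)

print(missing_cells)
```

Listing the missing cells before interpreting a heatmap helps separate gaps in data availability from genuinely low employment levels.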

6.2.3 Key Insights

  • Impact of Economic Shocks:
    The period between 2010 and 2023 highlights the effects of two major economic shocks:
    • The 2008 financial crisis led to reduced employment, particularly for foreign-citizen categories.
    • The COVID-19 pandemic caused temporary disruptions but was followed by a recovery in 2023.
  • Country-Specific Patterns:
    • Germany (DE) and France (FR) continue to show high employment levels, demonstrating economic resilience and effective labor policies.
    • Spain (ES) shows employment fluctuations, with a decline post-2008 and recovery post-pandemic.
  • Labor Market Integration:
    The rebound in employment for foreign-citizen categories in 2023 underscores the importance of migration policies and workforce integration programs in sustaining labor markets.
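
The decline and recovery patterns described above could be quantified as a percentage change between the two years for each country and citizenship category. The following is a sketch on invented data (hypothetical countries "AA" and "BB"), not a result from the Eurostat dataset.

```r
library(dplyr)
library(tidyr)

# Toy data in long format (invented values, in thousand persons)
toy <- tribble(
  ~geo, ~citizen, ~year, ~total_employment,
  "AA", "NAT",    2010,  1000,
  "AA", "NAT",    2023,  1100,
  "AA", "FOR",    2010,   100,
  "AA", "FOR",    2023,   150,
  "BB", "NAT",    2010,   500,
  "BB", "NAT",    2023,   450
)

pct_change <- toy %>%
  # One column per year: y2010 and y2023
  pivot_wider(names_from = year, values_from = total_employment,
              names_prefix = "y") %>%
  # Relative change from 2010 to 2023, in percent
  mutate(pct_change = 100 * (y2023 - y2010) / y2010)

print(pct_change)
```

A relative measure like this puts small and large countries on the same footing, which the absolute color scale of the heatmaps does not.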